46 research outputs found

    Cross-language Projection of Dependency Trees for Tree-to-tree Machine Translation

    Syntax-based machine translation (MT) is an attractive approach for introducing additional linguistic knowledge into corpus-based MT. Previous studies have shown that tree-to-string and string-to-tree translation models perform better than tree-to-tree translation models, because tree-to-tree models require two high-quality parsers, one for the source language and one for the target language. In practice, high-quality parsers for both languages are difficult to obtain, which limits translation quality. In this paper, we explore a method that transfers parse trees from the language side that has a high-quality parser to the side that has a low-quality parser. We then combine the transferred parse trees with the original low-quality parse trees. In our tree-to-tree MT experiments, we observed that the combined trees yield higher BLEU scores than either the original low-quality trees or the transferred trees used alone.
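A minimal sketch of the projection step, assuming a one-to-one word alignment and dependency heads stored as index lists (-1 marks the root). The names `project_heads` and `combine` are hypothetical, and the fallback rule in `combine` (keep the projected head, otherwise the low-quality parser's head) is a naive stand-in rather than the combination method used in the paper; its output is not guaranteed to be a well-formed tree.

```python
from typing import Dict, List, Optional

def project_heads(
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[int]]:
    """Project source-side dependency heads onto target tokens.

    src_heads[i] is the head of source token i (-1 marks the root).
    alignment maps source indices to target indices and is assumed one-to-one.
    Entry j of the result is the projected head of target token j, -1 for a
    root, or None when token j or its source head is unaligned.
    """
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s_dep, t_dep in alignment.items():
        s_head = src_heads[s_dep]
        if s_head == -1:
            tgt_heads[t_dep] = -1  # the source root becomes a target root
        elif s_head in alignment:
            tgt_heads[t_dep] = alignment[s_head]  # follow the alignment link
        # else: the head is unaligned, so leave None for the fallback step
    return tgt_heads

def combine(projected: List[Optional[int]], original: List[int]) -> List[int]:
    """Naive merge (hypothetical): keep each projected head, else the parser's head."""
    return [p if p is not None else o for p, o in zip(projected, original)]
```

For example, projecting a subject-verb-object source tree onto a subject-object-verb target order, `project_heads([1, -1, 1], {0: 0, 1: 2, 2: 1}, 3)` returns `[2, 2, -1]`.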

    MReD: A Meta-Review Dataset for Structure-Controllable Text Generation

    When existing text generation datasets are used directly for controllable generation, the lack of domain knowledge limits the aspects that can be controlled. For example, when the CNN/Daily Mail dataset is used for controllable text summarization, there is no guidance on what the summary sentences should emphasize. A more useful text generator should leverage both the input text and a control signal to guide generation, which can only be built with a deep understanding of the domain. Motivated by this vision, our paper introduces a new text generation dataset named MReD. MReD consists of 7,089 meta-reviews, and all of its 45k meta-review sentences are manually annotated with one of 9 carefully defined categories, including abstract, strength, and decision. We present experimental results for state-of-the-art summarization models and propose methods for structure-controlled generation with both extractive and abstractive models using our annotated data. By exploring various settings and analyzing model behavior with respect to the control signal, we demonstrate the challenges of the proposed task and the value of MReD. MReD also gives us a better understanding of the meta-review domain.
    Comment: 15 pages, 5 figures, accepted at ACL 202
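One plausible way to feed a structure-control signal to a sequence-to-sequence summarizer is to prepend the desired sequence of sentence categories to the source text. The sketch below assumes that input format; the separator strings, the `collapse` option (merging consecutive identical labels), and the function name are hypothetical, and only the three category names given above are used.

```python
from itertools import groupby
from typing import List

# Only the three category names given in the abstract are listed; the other
# six MReD categories are deliberately not guessed here.
KNOWN_LABELS = {"abstract", "strength", "decision"}

def build_control_input(
    labels: List[str],
    source_text: str,
    collapse: bool = False,
) -> str:
    """Prepend a category sequence to the source text (hypothetical format).

    With collapse=True, runs of identical consecutive labels are merged, so the
    control signal describes segments rather than individual sentences.
    """
    unknown = [label for label in labels if label not in KNOWN_LABELS]
    if unknown:
        raise ValueError(f"labels outside the illustrative set: {unknown}")
    if collapse:
        labels = [label for label, _ in groupby(labels)]
    return " | ".join(labels) + " ==> " + source_text
```

For instance, `build_control_input(["abstract", "strength", "strength", "decision"], "Review 1: ...", collapse=True)` returns `"abstract | strength | decision ==> Review 1: ..."`.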

    Spontaneous Liquid Crystal and Ferromagnetic Ordering of Colloidal Magnetic Nanoplates

    Ferrofluids are familiar as colloidal suspensions of ferromagnetic nanoparticles in aqueous or organic solvents. The dispersed particles are randomly oriented, but their moments align when a magnetic field is applied, producing a variety of exotic and useful magneto-mechanical effects. A longstanding interest and challenge has been to make such suspensions macroscopically ferromagnetic, that is, to give them uniform magnetic alignment in the absence of a field. Here we report a fluid suspension of magnetic nanoplates that spontaneously aligns into an equilibrium nematic liquid crystal phase that is also macroscopically ferromagnetic. Its zero-field magnetization produces distinctive magnetic self-interaction effects, including liquid crystal textures of fluid block domains arranged in closed flux loops, and makes this phase so sensitive that it changes shape dramatically even in the Earth's magnetic field.

    AgentBench: Evaluating LLMs as Agents

    Large Language Models (LLMs) are becoming increasingly smart and autonomous, targeting real-world pragmatic missions beyond traditional NLP tasks. As a result, there is an urgent need to evaluate LLMs as agents on challenging tasks in interactive environments. We present AgentBench, a multi-dimensional, evolving benchmark that currently consists of 8 distinct environments for assessing the reasoning and decision-making abilities of LLMs acting as agents (LLM-as-Agent) in a multi-turn, open-ended generation setting. Our extensive tests of 27 API-based and open-source (OSS) LLMs show that, while top commercial LLMs have a strong ability to act as agents in complex environments, there is a significant performance gap between them and their OSS competitors. We identify the typical reasons for failure across environments and LLMs, showing that poor long-term reasoning, decision-making, and instruction-following abilities are the main obstacles to developing usable LLM agents. Training on code and on high-quality multi-turn alignment data could improve agent performance. Datasets, environments, and an integrated evaluation package for AgentBench are released at https://github.com/THUDM/AgentBench.
    Comment: 55 pages
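The multi-turn setting can be illustrated with a toy evaluation loop: the agent sees the interaction history, emits an action, and the environment returns an observation until the episode ends. This is a hypothetical sketch, not the AgentBench evaluation package; `Environment`, `GuessEnv`, `run_episode`, `success_rate`, and the scripted agent are illustrative stand-ins for a real environment and an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol, Sequence, Tuple

class Environment(Protocol):
    """Interface assumed for this sketch (hypothetical, not AgentBench's API)."""
    def reset(self) -> str: ...
    def step(self, action: str) -> Tuple[str, bool, bool]: ...  # (observation, done, success)

# An agent maps the interaction history so far to its next action.
Agent = Callable[[List[str]], str]

@dataclass
class GuessEnv:
    """Toy environment: the agent must name a secret word within max_turns."""
    secret: str
    max_turns: int = 3
    turns: int = 0

    def reset(self) -> str:
        self.turns = 0
        return "Guess the secret word."

    def step(self, action: str) -> Tuple[str, bool, bool]:
        self.turns += 1
        if action.strip().lower() == self.secret:
            return "Correct.", True, True
        if self.turns >= self.max_turns:
            return "Out of turns.", True, False
        return "Wrong, try again.", False, False

def run_episode(agent: Agent, env: Environment) -> bool:
    """Alternate agent actions and environment observations until done."""
    history = [env.reset()]
    while True:
        action = agent(history)
        observation, done, success = env.step(action)
        history += [action, observation]
        if done:
            return success

def success_rate(agent: Agent, envs: Sequence[Environment]) -> float:
    """Fraction of episodes the agent completes successfully."""
    results = [run_episode(agent, env) for env in envs]
    return sum(results) / len(results)

def make_list_agent(candidates: List[str]) -> Agent:
    """Scripted stand-in for an LLM: tries the candidates in order."""
    def agent(history: List[str]) -> str:
        n_previous_actions = (len(history) - 1) // 2
        return candidates[n_previous_actions % len(candidates)]
    return agent
```

With `make_list_agent(["apple", "pear", "plum"])`, the agent solves `GuessEnv("pear")` on its second turn but runs out of turns on `GuessEnv("kiwi")`, so the success rate over those two episodes is 0.5. Keeping the agent stateless (it derives everything from the history) mirrors how a chat-style LLM would be prompted turn by turn.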

    Sciences of the USA 1418-1421 | PNAS

    The discovery of the block-like structure of linkage disequilibrium (LD) in human populations holds the promise of delineating the etiology of common diseases. However, understanding the magnitude, mechanism, and utility of LD sharing between populations is critical for future genome-wide association studies. In this study, based on 20,000 SNPs on chromosome 21, we observed substantial LD sharing among six non-African populations, but much less between African-American and non-African populations. We also demonstrated the respective roles of recombination and demographic events in shaping LD sharing. Furthermore, we showed that haplotype-tagged SNPs chosen from one population are portable to the others in East Asia. We therefore concluded that the magnitude of LD sharing between human populations justifies using representative populations to select haplotype-tagged SNPs for genome-wide association studies of complex diseases.
    Keywords: bottleneck | genetic distance | association study | common disease | genetic variant
    Comprehensive testing of the association between genetic variations in the human genome and common diseases holds the promise of delineating the genetic architecture of these diseases (1-5). Substantial sharing of the boundaries and specific haplotypes of linkage disequilibrium (LD) blocks between populations has been observed (6). However, variations of haplotypes and LD across populations have also been reported, raising concerns that they may hinder genome-wide association testing in practice (7-9). These conflicting observations on the magnitude of LD sharing between human populations call for a careful examination of the following three questions, which are fundamental to developing strategies for genome-wide association testing. First, LD sharing between populations should be measured independently of the definition of LD blocks, since block definitions introduce inconsistent block boundaries (10).
Second, the mechanisms that shape LD sharing between populations are yet to be fully explored, although the roles of recombination hotspots and demographic events have been implicated. […]
To address the aforementioned questions, we typed >20,000 SNPs on chromosome 21 in seven populations: three representative continental populations [African-American (AFR), European (EUR), and Han Chinese (HAN)] and four other major East Asian (EA) populations. This design allows a close examination of LD sharing both between continental groups and within East Asia. In this report, we measured LD sharing between populations independently of the definition of LD blocks, and we showed that bottleneck events play a critical role in shaping LD sharing between Africans and non-Africans, but much less so among non-Africans. An important question for applying HapMap results to disease studies is how well tagSNPs selected from a HapMap population port to disease studies performed in other populations. In this study, we showed that tagSNPs selected from representative continental populations are indeed portable, with reasonable efficiency, to other populations on the same continent, at least in East Asia. In addition, we proposed a simple guideline that allows a quick evaluation of tagSNP portability between populations by typing a small number of SNPs.
Results
Overall, 26,112 SNPs were selected and typed in this study, and data from 19,060 SNPs passed the quality control criteria and were used for further analyses. The SNPs and the quality control criteria for SNP selection are described in Materials and Methods. Seven world populations, including EUR, AFR, and five EA populations, were studied. The five EA populations, i.e., HAN, Miao (HMJ), Zhuang (CCY), Wa (WBM), and Uighur (UIG), represent five major linguistic families spoken in East Asia.
Preservation of LD between populations, i.e., LD sharing ($S$, or $S_{AB}$ when population A is the reference), is measured as the proportion of SNP pairs in LD in one population (population A, the reference) that are also in LD in another (population B). In this study, LD sharing was estimated without inferring haplotype blocks, so the measure is independent of the definition of haplotype blocks. LD between two loci was measured as $r^2$ (16). Details of the LD-sharing measure are described in Materials and Methods. LD sharing between EA populations ranges from 63-74% for $r^2 \geq 0.1$ and 70-84% for $r^2 \geq 0.5$ (se
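The LD-sharing measure described above can be sketched as follows, assuming phased haplotypes coded 0/1 and every SNP pair considered. Here $r^2 = D^2 / [p_i(1-p_i)p_j(1-p_j)]$ with $D = p_{ij} - p_i p_j$, and $S_{AB} = |P_A \cap P_B| / |P_A|$, where $P_X$ is the set of SNP pairs whose $r^2$ is at or above a threshold in population X. The function names are hypothetical, and the paper's exact procedure (given in its Materials and Methods) may differ, for instance in how pairs are restricted by distance or how LD is estimated from unphased genotypes.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

# Assumption: phased haplotypes, one 0/1 allele code per SNP.
Haplotype = Sequence[int]

def r_squared(haps: List[Haplotype], i: int, j: int) -> float:
    """Squared allelic correlation r^2 between SNPs i and j."""
    n = len(haps)
    p_i = sum(h[i] for h in haps) / n
    p_j = sum(h[j] for h in haps) / n
    p_ij = sum(1 for h in haps if h[i] == 1 and h[j] == 1) / n
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        return 0.0  # a monomorphic SNP has undefined r^2; treated as no LD here
    d = p_ij - p_i * p_j  # the LD coefficient D
    return d * d / denom

def pairs_in_ld(haps: List[Haplotype], threshold: float) -> Set[Tuple[int, int]]:
    """All SNP pairs (i < j) whose r^2 is at least the threshold."""
    n_snps = len(haps[0])
    return {
        (i, j)
        for i, j in combinations(range(n_snps), 2)
        if r_squared(haps, i, j) >= threshold
    }

def ld_sharing(
    haps_a: List[Haplotype], haps_b: List[Haplotype], threshold: float
) -> float:
    """S_AB: share of reference population A's LD pairs also in LD in B."""
    ref = pairs_in_ld(haps_a, threshold)
    if not ref:
        raise ValueError("no SNP pairs in LD in the reference population A")
    return len(ref & pairs_in_ld(haps_b, threshold)) / len(ref)
```

Because the reference population's LD pairs form the denominator, $S_{AB}$ and $S_{BA}$ generally differ.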

    Dusky-like Is Critical for Morphogenesis of the Cellular Protuberances and Formation of the Cuticle in Henosepilachna vigintioctopunctata

    Dusky-like (Dyl) is a transmembrane protein containing a zona pellucida domain. Its physiological roles during metamorphosis have been well explored in Drosophila melanogaster and have also been documented in Tribolium castaneum. However, Dyl has undergone a functional shift between dipteran and coleopteran insects, so investigating Dyl in other insects will help to further clarify its function in insect growth and development. Henosepilachna vigintioctopunctata is an important coleopteran pest that causes enormous agricultural losses in China. In this study, we found that Hvdyl expression was detectable in embryos, larvae, prepupae, pupae, and adults. We knocked down Hvdyl in third- and fourth-instar larvae and in pupae by RNA interference (RNAi). RNAi of Hvdyl caused two main phenotypic defects. First, the growth of epidermal cellular protuberances was suppressed. Injection of dsdyl (double-stranded dusky-like RNA) at the third-instar larval stage truncated the scoli throughout the thorax and abdomen and shortened the setae on the head capsules and mouthparts of fourth-instar larvae. Introducing dsdyl at the third- and fourth-instar stages led to misshapen pupal setae, which were shortened or became black nodules. Treatment with dsdyl at the larval and pupal stages resulted in deformed adults with completely suppressed wing hairs. Moreover, knockdown of Hvdyl at the third-instar stage caused deformed larval mouthparts in the fourth instar; as a result, foliage consumption was inhibited and larval growth was slowed. The results indicate that Dyl is associated with the growth of cellular protuberances throughout development and with the formation of the cuticle in H. vigintioctopunctata.